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Abstract 

A numerical method is proposed to approximate the inverse of a general bi-Lipschitz nonlinear dimensionality 
reduction mapping, where the forward and consequently the inverse mappings are only explicitly defined 
on a discrete dataset. A radial basis function (RBF) interpolant is used to independently interpolate 
each component of the high-dimensional representation of the data as a function of its low-dimensional 
representation. The scale-free cubic RBF kernel is shown to perform better than the Gaussian kernel, 
as it does not require the difhcult-to-choose scale parameter as an input, and does not suffer from ill- 
conditioning. The proposed numerical inverse is shown to be mathematically similar to the eigenvector 
interpolation known as the Nystrom method, a commonly used numerical method for rapid approximation 
of eigenvectors of a dense weight matrix. Based on this observation, a critique of the Nystrom method is 
provided, with suggestions for improvement. 

Keywords: nonlinear dimensionality reduction, graph Laplacian, radial basis function, interpolation, 
Nystrom method 



1. Introduction 

Generation of low dimensional representations of data in high dimension has recently become an area of 
intense research. Many nonlinear methods are available such as kernel Principle Component Analysis pQ, 
Laplacian Eigenmaps j2j, and Local Linear Embedding [3], among many others. A significant limitation of 
many nonlinear dimensionality reduction methods is that they are only defined on a discrete set of data. As 
a result, the inverse mapping is also only defined on the data. There are well known strategies to extend 
the forward map to a new point — for example the Nystrom extension. However, the problem of extending 
the inverse map has received little attention so far (but see [HGEIS]). 
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We present a method to numerically invert a general smooth bi-Lipschitz nonlinear dimensionality reduc- 
tion algorithm such as Laplacian Eigenmaps over all points in the image of the forward map. The method 
relies on interpolation via radial basis functions in high dimension, modeling a smooth manifold in high 
dimension as a function of the data embedded in a lower dimensional space. A scale-free alternative to 
the commonly used Gaussian radial basis function kernel is demonstrated to provide computationally stable 
results. 

The proposed numerical inverse is shown to be mathematically similar to the eigenvector interpolation of 
the Nystrom method, a commonly used numerical method for rapid approximation of eigenvectors of a dense 
weight matrix. Based on this observation, a critique of the Nystrom method is provided, with suggestions 
for improvement. 

There are two notable contributions of this paper. The first is the introduction of a numerically-stable 
scale-free approach to the problem of interpolation of data in high-dimension as a function of its low- 
dimensional representation. The second is the unambiguous interpretation of the Nystrom method as a 
radial basis function interpolant, and the subsequent observations about the stability of the method. 

2. Inverse Mapping 

The problem is posed as follows: {a^ 1 ), . . . , x^} c R D , lie on a bounded low-dimensional smooth 
manifold Mel". The data are embedded in R d via a nonlinear mapping 

*„ : R D -> R d ,x^ h-> *„(a;( 1 )) for i = l,...,n. (1) 

We assume the existence of an underlying continuous operator, 3> : A4 — > <&(.M), that provides an extension 
to <!?„. In the specific case of Laplacian Eigenmaps, for example, the Graph Laplacian converges pointwise 
to the Laplace Beltrami operator on the manifold, and one can show the convergence of the associated 
eigenvectors: lim *& n (x) = $>(x), for all x € M. [7]. 

Like Laplacian Eigenmaps, most common nonlinear dimension reduction algorithms only provide an 
explicit mapping for a discrete dataset. Therefore, the inverse mapping is also only defined on the 
data. The goal of the present work is to generate a numerical extension of €>^ 1 to all of 3>(.M) C K d , 
the image of the corresponding continuous operator. To simplify the problem, we assume the mapping 
provided by our dimension reduction algorithm coincides with the limiting continuous operator on the data, 
i.e. $„(xW) = <&(a;W) for i = 1, ...,n. This notation allows us to discard the notation to denote 
the data-specific mapping. Instead, we use the notation <& to simultaneously denote both the discrete and 
continuous mapping. With this notation, we propose a method to construct an approximate inverse 

* f : <f>(M) -> R D , (2) 

such that lim &\y) = for all y e &(M). 

n— >oo 
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Owing to the connections between the method proposed below and the Nystrom extension, we use 
notation specifically suggestive of a dimension reduction method involving a spectral decomposition of the 
weight matrix of the data in M. D , which is used for instance in Laplacian Eigenmaps, LLE, ISOMAP, etc. 
However, the numerical extension method has much broader application, and may be applied to numerically 
invert any smooth bi-Lipschitz embedding of a dataset in M. d . 

We denote the weight matrix W, where Wij = k{x^' , x^> ) is a measure of the affinity between data points 
x^ and x^> G R D . A typical measure is the Gaussian: fc(iW,iW) = exp(— e 2 ||a;W — £c^|| 2 ). As noted 
above, we are specifically interested in dimension reduction methods that involve a spectral decomposition 
of W. Thus we consider the eigenvalue problem 

Wfa = Xi4> u (3) 

where (pi and A; are the eigenvectors and corresponding eigenvalues of W for i — l,...,n. For A = 
diag(Ai, . . . , A„) and $ = . . . , <f> n ] G ]R™ X ™, we consider the full eigenvector decomposition of the weight 
matrix W: 



W = <M$ T . (4) 

We define the mapping 0; : R D — > M. as follows: <j)i(x^) — tfin, where <pu is the i, I entry of the eigenvector 
matrix $ £ M. nxn . Using this notation, 

( f> l = M^ 1) );---,M^ n) )} T , (5) 

and we naturally choose to denote the spectral embedding of xW g JVl c R D as 

*(ajW) = [<M*«), • • • , Mv [l) )] e *{M) c R d . (6) 

(note: typically d << n, and dimension reduction is achieved using a partial eigenvector decomposition). 

Restating the problem: we would like to generate an approximate inverse : €>(A / J) — > M 15 that will 
converge to the true inverse $ 1 1 : $(A / J) — > in the limit at n — > oo. For notational simplicity, we will 
introduce the following notation for the data embedded in M. d : 

y = *(x) and y« = *(aj(0). (7) 

We do not consider how the new point y S $(A4) is generated, but simply assume that there exists an 
x e Ai C M 15 such that &(x) — y. Methods exist for approximating a new point y in the image of 
(e.g. the Nystrom extension [8 ). However, in the present work we focus our attention on the problem of 
approximating the inverse map, and thus, we assume that we can accurately generate a point y in the range 
of *. 
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2.1. Linear Inverse Mapping 

One method proposed for this type of inverse mapping was proposed by [61 . The authors in [6i approxi- 
mate x — «&~ 1 (y) with <&^(y), defined by linearly interpolating the existing coordinates: 

#t(„)= J2 <h* a) , ( 8 ) 

where each x^ is an original point that is mapped to a neighbor yW> = <&(a;^) of y. Here, H y denotes 
the set of neighbors of y in Mr. The interpolation coefficients are calculated as follows, 

exp(-||y-yO-)|| 2 Ar) 
' 3 E^co^expf-Htf-yWHVa)' 1 j 

where a is chosen to be the distance between y and its nearest neighbor. The algorithm is justified by the 

assumption that the interpolation weights should be similar between the two spaces R D and R d . In fact, 

if we consider each point € K D to be a function of its coordinates yV> € M d , we observe that this 

algorithm is more commonly known at Shepard's method [S], a moving least squares approximation method. 

This method is optimal in the following way: at any location y, the function is approximated by the 

constant function that minimizes the sum of squared errors within a neighborhood Af y , weighted according 

to their proximity to y. In [3], the Gaussian is used as the weight function for Shepard's method. 

The first observation is that, despite the similar appearance to a radial basis function (RBF) interpolant 



(see e.g. Eq (10)), the algorithm does not produce an RBF interpolant. Instead, the algorithm uses a set 
of linear weights that may not even reproduce the new point y as a linear combination of its neighbors in 
M. d . Another key observation about this inverse mapping method is that the weights Cj are all positive and 

c j = 1- As a result, the interpolation is restricted to within the convex hull of the neighbors. 

The limitations of Shepard's method [5J are illustrated in figure [l] Here the performance of Shepard's 
method is compared to the algorithm discussed in this paper. For illustrative purposes, the data are scattered 
on the unit circle in M. D = M 2 , and mapped to M d = M 2 via Laplacian Eigenmaps. Shepard's method, as well 
as our algorithm, are tested on a known point. We observe that restricting interpolation to within the convex 
hull of the neighbors limits the algorithm's ability to account for curvature. Shepard's method is tested for 
varying neighborhood sizes (20, 10, or 5 nearest neighbors), and is found to perform better with fewer 
nearest neighbors, as the convex hull of the neighbors become more-localized near the true point. However, 
the method still fails to honor the curvature of the manifold. Our approach performs well, matching the 
curvature of the manifold. 

Provided sufficient data to characterize a curved manifold are available, we believe that a good ap- 
proximate inverse should be capable of reproducing curvature. A natural way to approach the problem 
of extending the inverse map, while accounting for curvature, is via interpolation. To accomplish this, we 
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Figure 1: Comparison of Shepard's method 6j with cubic RBF method, for 20 points randomly scattered on the unit circle in 
R 2 . Shepard's method can only interpolate within the convex hull of the neighbors, thus the algorithm fails to match curvature. 
Shepard's method is tested for 20, 10, and 5 nearest neighbors (NN). 

consider x £ M D to be a multivariate function of y g R d , subject to the constraints ajW = 4> _1 (y(^), 
for i = l,...,n. Most interpolation strategies are directly applicable to univariate functions. Thus, we 
independently interpolate each coordinate in M. D as a function of all of the coordinates in E d . To simplify 
the present analysis, we assume the data are free of noise. The approximation problem is not considered 
here, and will be the subject of future work. We only mention that the proposed interpolation algorithm 
is easily generalized to provide an approximation strategy that can accommodate a noisy representation of 
a manifold. Interpolation with RBFs has the advantages of being algorithmically straightforward in any 
dimension, and can easily account for curvature. To this end, we begin the next section with a basic intro- 
duction to radial basis function interpolation. Abundant literature exists on the topic for the curious reader 
(e.g. [HI HOI IXX) provide surveys at varying levels of detail). 

The remainder of the paper is organized as follows. Section [3] introduces the RBF inverse mapping 
algorithm. Section [4] considers convergence of Gaussian RBF interpolants, followed by section [5] which makes 
observations about problems arising from ill-conditioning of the interpolation matrix. Section [6] introduces, 
motivates, and discusses convergence of an alternative scale-free cubic RBF kernel. Section [7] demonstrates 
performance of the inverse mapping algorithm on several example problems. Section [8] contains a discussion 
and novel interpretation of the Nystrom extension, followed by a final summary in section [9] 

3. Radial Basis Function Inverse Mapping 

Before introducing our inverse mapping algorithm, we define a basic RBF interpolant. For the RBF kernel 
k : R d x R d — > K, and the node set {z^\ . . . , z^ n '} with corresponding function values {/(z^), . . . , f(z^)}, 
the RBF interpolant takes the form 
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s(z) = J2^ (j) k(z,z^). (10) 

3=1 

The weights, {a^\ . . . , a*™- 1 }, are determined by solving the following system of equations: 
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which enforces the condition s(z^) = f(z^), for i = 1, . . . , n. 

We propose a method to compute the inverse mapping, which better honors the curvature of the underly- 
ing manifold by interpolating using RBFs. Similar methods have been explored in jHEJH^] to interpolate data 
on a low-dimensional manifold. In our case, we generate a D-dimensional RBF interpolant: & : R d —> M. D . 
To accomplish this, we simultaneously generate D independent RBF interpolants to each coordinate in M. D . 

Let K denote the RBF kernel matrix for the data embedded in R d : Kij = k(y^\y^). In [BJ, the authors 
assume that K w W by using the same kernel in R d and R D . In fact, this assumption is not needed in this 
work, and we may consider various RBF interpolation kernels, not only the Gaussian (which is typically 
used to generate the weight matrix in M. D ). In order to generate the radial basis function interpolant, we 
begin by solving the system 



KA = X, 



(12) 



where the i-th row of X g M™*- are the D coordinates of xW e R D . The j-th column of A £ 
the interpolation coefficients for the RBF interpolant of the j-th dimension in M. D . In detail, 
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(13) 



where cr- for i — 1, . . . ,n are the RBF interpolation coefficients for the j-th dimension in R D . The new 



point x = 3? (y) is interpolated using (10), which can be written in matrix form as 



*t(y) T = k(y, -) T A = k(y, -fK^X, (14) 

where k(y, ■) = [k(y, y«), k(y, y^)] T . 

For non-coincident interpolation nodes {y^ x \ ■ ■ ■ , y^} €E R d , and a Gaussian kernel k we are guaranteed 
K is non-singular For large data sets, the interpolation need not depend on all the data, as the RBF 
interpolant is only sensitive to points in the neighborhood of interest. The interpolation algorithm may be 
implemented locally, providing immense savings in computation time. 
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4. Convergence of the Interpolant 

As we consider interpolation of the data using RBFs, three questions must be addressed: 1) Given a 
set of interpolation nodes, is the interpolation matrix necessarily non-singular? 2) What types of functions 
can be approximated? 3) What convergence rate can we expect as we populate the domain with additional 
nodes? 

Convergence criteria for RBF interpolants have been studied extensively in the literature. For a detailed 
treatment, see [9j[TT|. In this section we briefly introduce the necessary concepts to understand convergence 
of Gaussian RBF interpolants. Before considering the questions posed above, we first study the expected 
smoothness of the embedding <I> and its inverse. 

4-1- Properties of the Inverse of the Graph Laplacian 

The data {x^\ . . . , x'"'} are a discrete representation of a bounded smooth manifold A4 C M. D . We 
model the manifold M. as afunction of the eigenvectors of the graph Laplacian: $>(x) = {<fti(x), . . . , (f)d{x)] T . 
Thus, the inverse mapping ffr^ 1 : &(A4) — > M. D will inherit properties from the forward mapping 3? : Ai — > 
M. d . We recall that <pi is an eigenvector of the graph Laplacian matrix, which is a discrete approximation 
of the Laplace Beltrami operator on the manifold M.. Thus, in the continuous setting the mapping <I> : 
M — s- &(M) C R d must be infinitely differentiate, implying that <& 1 : <&(M) — > R D is also infinitely 
differentiable. By assumption, A4 is a bounded subset of M. D . Provided that 3? is bi-Lipschitz, then $>(A4) 
is a bounded subset of R d . As a result, the components of 3> _1 are in C°°(&(M)). 

4- -2. Properties of Gaussian RBF Interpolants 

To consider the questions posed at the beginning of this section, we must first introduce several definitions. 
We begin with the definition of a positive definite kernel. 

Definition 1. A real-valued continuous function k : R d x R d ->• R is called positive definite on E d if 

n n 

^2^2a iaj k{z^\z^)>0, (15) 
for n distinct points {z^\ . . . , z^} C R d , and a — [ax, . . . , a n ] T € M". // in addition, the quadratic form 



(15) is equal to zero if and only if a = 0, then k is called strictly positive definite on R d 191. 

Among others, a commonly used strictly positive definite kernel is the Gaussian [S]. We note that this 



property of the Gaussian implies unique solvability of (13 1. 

We now consider the second question: what types of functions can we approximate to arbitrary precision? 
As we might expect, an RBF interpolant will converge to functions contained in the closure of the span of 
all translates of the kernel k, known as the native space: 
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Definition 2. Given a strictly positive definite reproducing kernel k : Q x S7 — > R on a Hilbert space, 
the native space 7Vfc(f2) is defined as the completion of the space of linear combinations of the kernel: 
"Hfc(ri) = span{k(-,z) : z G Q}. This completion is with respect to the k-norm, which is induced by the 
inner-product given by the reproducing kernel k on the pre-Hilbert space J^j. 

For 17 — R d , the native space for the Gaussian RBF is 



where / and k are the Fourier transforms of / and k, respectively [9|. We observe that the native space for 
Gaussians is not large, and is restricted to functions whose Fourier transform decays at least as fast as the 
Gaussian 

To answer the third question, we must be able to consistently quantify the density of interpolation nodes. 
The commonly used measure is the fill distance, the maximum distance from an interpolation node: 

Definition 3. For the domain C R d and a set of interpolation nodes Z = {z^\ . . . ,z^} C fl the fill 
distance, hz,n, is defined by 



Theorem 15.1 in [S] establishes exponential L°° convergence of a Gaussian RBF interpolant to functions in 
the native space (with respect to decreasing fill distance). 

With a basic understanding of the convergence properties of Gaussian RBF interpolants, we may now 
consider whether the components of <I> 1 can be approximated to arbitrary precision. Despite the regularity 
of <1? 1 over the domain SI = <f>(A4), it appears that there may not be a consistent extension of the function 
to all of K d such that its components are members of the (very restricted) native space of the Gaussian. For 
example, any extension to a compactly supported function will exhibit a slowly decaying Fourier transform 
relative to the Gaussian. Any extension with Gaussian decay is the sum of a Gaussian and a compactly 
supported function, again with slowly decaying Fourier transform. As a result, we are likely restricted to 
approximate approximation when using the Gaussian RBF. However, these issues are ultimately irrelevant, 
as numerical issues will prevent convergence of Gaussian RBF interpolants, even within the native space. 

5. Ill-Conditioning 

In this section we discuss ill-conditioning of Gaussian kernel matrices. The condition number is closely 
related to the choice of scale parameter e. We use the scale parameter convention common to the RBF 
community: k(z,w) = exp(— e 2 \\z — w\\ 2 ). This scale parameter corresponds to the inverse of a, the 
commonly used scale parameter within the machine learning community. With the RBF convention, a small 




(16) 



hzn '■= sup min \\z — z^'\ 



(17) 
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scale parameter corresponds to wide and flat basis functions, while a larger scale parameter corresponds to 
narrow and localized Gaussians. 

In the previous section, it was observed that a Gaussian RBF interpolant will converge exponentially to 
a function in the native space with respect to fill distance. However, the error bounds are theoretical and 
do not consider interpolation error resulting from numerical ill-conditioning. It is a well known that if the 
same scale parameter is used as fill distance is reduced, a Gaussian kernel matrix W will rapidly become 
ill-conditioned, and the resulting interpolant will exhibit numerical saturation error. This issue is common 
among many RBF interpolants. As shown in [13], the eigenvalues of W follow patterns in the powers of 
e that increase with successive eigenvalues, which leads to rapid ill-conditioning of W with increasing n. 
These patterns depend on a number of factors including dimension as well as the geometry of the nodes. 
The nature of this behavior is not the focus of this paper, and the interested reader is referred to [T3J for 
additional details. 

In 1 14] . the authors show that the numerical rank (the number of eigenvalues larger than a certain 
threshold) of a Gaussian kernel matrix is independent of the number of data points inside a box in M. D , but 
instead only depends on the scale parameter. As a result, as additional points are added, the additional 
eigenvalues added are small and rapidly increase the condition number of the matrix. 

The relationship between the condition number of W and the spacing of interpolation nodes is explored 
in figure [2] Owing to the difficulty in precisely establishing the boundary of the domain il C R d given a 
discrete set of randomly sampled data, we note that estimating the fill distance hz,n is somewhat difficult. 
Additionally, the fill distance is a measure of the "worst case", and may not be representative of the "typical" 
spacing between nodes. Thus, we consider a proxy for fill distance which depends only on mutual distances 
between the data points. We use the term local fill distance, hi ooa i, to denote the average distance to a 
nearest neighbor: 



The local fill distance is less sensitive to slightly irregular node spacing resulting from random sampling in 
the domain, and thus more representative of the "typical" spacing of interpolation nodes. Nonetheless, given 
a regular sampling scheme, the local fill distance should provide a good proxy for fill distance. We note that 
several authors have recently proposed a similar notion of local fill distance which can be used to derive 
more accurate pointwise error estimates |15l 116] . In figure [2] we observe rapid ill-conditioning of the weight 
matrix with respect to decreasing local fill distance. 

Conversely, if the fill distance remains constant while the Gaussian kernel scale parameter is reduced, 
the resulting interpolant improves until ill-conditioning of the W matrix leads to propagation of numerical 
errors. Figure [3] demonstrates the rapid increase in condition number of W with decreasing values of e. 
When interpolating with the Gaussian kernel, choice of the scale parameter e is difficult. On one hand, 




(18) 
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Figure 2: Condition number of the Gaussian kernel matrix W for points randomly scattered on the first quadrant of the unit 
sphere in various dimensions with scale parameter e = 10~ 2 , for varying local fill distance jl8| . Note: the same range of n was 
used in each dimension (10 to 1000). In high dimension, it takes a large number of points to reduce fill distance. However, the 
condition number of W still grows rapidly for increasing n. 

smaller values of e likely lead to a better interpolant. For example: in 1-d, a Gaussian RBF interpolant 
will converge to the Lagrange interpolating polynomial in the limit as e — > |17| . On the other hand, the 
interpolation matrix becomes rapidly ill-conditioned for decreasing e. Figure [4] shows the interpolation error 
versus e for a function in 1-d. 

One solution is to generate the RBF interpolant using a stable algorithm. The first such algorithm was 
the Contour-Pade algorithm |18j . This algorithm relies on contour integration in the complex-e plane to 
avoid the ill-conditioned region, while calculating the interpolant in the e — > limit. The RBF-QR |19l 
ISU] and RBF-GA [5T] algorithms rely on generating a better basis for the same space spanned by the 
Gaussian RBFs, which allows stable evaluation of the interpolant for small e. All of the stable algorithms 
are more computationally intensive and algorithmically complex than the RBF-Direct method, making them 
undesirable for the inverse-mapping interpolation task. In the following section we propose a method that 
avoids the numerical issues associated with Gaussian RBF interpolation by using a numerically stable, 
scale-free interpolation kernel. 

6. Cubic RBF Interpolation 

Saturation error can be avoided by using the scale-free RBF kernel g(z,w) = \\z — w\\ 3 , one instance 
from the set of RBF kernels known as the radial powers: 

g(z, w) = \\z- w\\? for ,3 = 1,3,5,.... (19) 
Together with the thin plate splines: 

g{z,w) = \\z-w\\P\og\\z-w\\ for = 2,4,6,..., (20) 
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Figure 3: Condition number of the Gaussian kernel matrix W for n = 200 points randomly scattered on the first quadrant of 
the unit sphere in various dimensions, for varying e. 

they form the family of RBFs known as the polyharmonic splines. 

For RBF interpolation, the cubic kernel is less intuitive than the Gaussian, as it is a monotonically 
increasing function. To motivate the use of this kernel, we begin by considering the one dimensional case, 
in which the cubic RBF generates a cubic spline interpolant. 

6.1. Cubic RBF and Cubic Splines 

For interpolations nodes {z^\ . . . , z^™'} C M, the cubic RBF interpolant is of the form 

n 

s(z) =^a W |z-z W | 3 . ( 21 ) 

i—\ 

Each term in the sum is piecewise cubic, with continuous first and second derivatives. Thus, s(z) inherits 
the same properties, and is observed to be a cubic spline. The cubic RBF kernel has a discontinuous 
third derivative at the origin, which translates into discontinuities in ds 3 /dz 3 at the interpolation nodes 
(corresponding to the typical formulation of a cubic spline). In general, the requirements of a cubic spline 
leaves two additional degrees of freedom that may be chosen to provide additional regularity. 

The cubic RBF interpolant automatically makes arbitrary endpoint choices for the additional degrees 
of freedom. We briefly investigate the endpoint behavior. See for a more detailed treatment. For 
simplicity, we consider interpolation on the interval [—1, 1]: —1 = < z^> < . . . < = 1. All terms in 
the cubic RBF interpolant change sign between z^ and z^™': 

, V™ . a«(z-z«) 3 for z> 1 
s(z) = { ^ t=1 V ^ " (22) 

ELi- aW 0- z (l) ) 3 for z<-l. 
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Figure 4: Maximum error versus e (left) for a Gaussian RBF interpolant, and example interpolants for several representative 
values of e (right). The function f(x) = 1/(1 + 16x 2 ) is interpolated using the Chebyshev nodes on the interval [—1,1] to avoid 
the Runge phenomenon. For large e, the basis function become too localized, and provide a poor interpolant. For small e, 
ill-conditioning of the interpolation matrix destroys the interpolant. 

A system of equations can be set up and solved to show the following cndpoint conditions on the second 
derivative |22j : 

S "(l)= 2S '(1)- S '(-1)-!( S (1) + *(-!)) 
s»(-l)= s '(l)-2 S '(-l)-f( S (l) + S (-l)). 
One common set of additional conditions are those of the natural spline, which enforce the conditions 
s"(l) = s"(— 1) = 0. These conditions are achieved by adding constant and linear polynomial terms to the 
cubic RBF interpolant, 

n 

S (z)=f3 + f3 1 z + Y / ^ ) \z-z^\\ (24) 
subject to the additional constraints Y17=t a ^ ~ 

=1 aWz« = 0. Then, for z > 1, 

n 

s(z) =/3o + /3 lZ + ^aW(z-z«) 3 , (25) 

i=l 

and we find that 

n n 

s"(z) =6zJ2 a{€> ~ 6 a(i)z(i) = °- ( 26 ) 

i=l i=l 

By an identical argument it is shown that s"(z) = for z < — 1. Figures [5] and ^demonstrate the improved 
behavior of the cubic RBF interpolants near boundaries when constant and linear polynomial terms are 
included. 
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Figure 5: Cubic RBF interpolants with additional constant and linear terms (left) and corresponding error (right) for the test 
function f(z) = 1 + sin(27rz) + cos(2z). 
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Figure 6: Cubic RBF interpolants with additional constant and linear terms on the test function f(z, w) = 1+0 2z?+o 3w 2 
Color plots represent absolute interpolation error: (white) to 7 X KT 4 (black). 

6.2. Multivariate Cubic RBF Interpolation 

As we consider multivariate interpolation using the cubic RBF, we must again address the following three 
questions: 1) Given a set of interpolation nodes, is the interpolation matrix non-singular? 2) What types of 
functions can be approximated? 3) What convergence rate can we expect as we populate the domain with 
additional nodes? 

In order to address the question of solvability, we must introduce some additional definitions. We begin 
with the definition of a conditionally positive definite kernel, a generalization of positive definite. 

Definition 4. A real-valued continuous function k : K d x M. d — > K is called conditionally positive definite 
of order m on M d if 

n n 

^^ a ^-fc(^),^))>0, (27) 
i=\ j=i 
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for n distinct points {z^\ . . . , z* n )} C M, d , and a. — [ai, . . . , a n ] T G M n , subject to 

n 

J2^p(z {i) ) = 0, (28) 



where p(z) is any real-valued polynomial of degree at most m — 1. 7/ in addition, the quadratic form (21) is 
equal to zero if and only if a = 0, then k is called strictly conditionally positive definite on M. d 

The cubic RBF is conditionally positive definite of order 2 on R d [S]. The attentive reader may note 
that the augmentation of the cubic RBF basis with the constant and linear polynomials will help to provide 
unique solvability of the interpolation system, given this property of the cubic kernel. Unique solvability 
will also require a mild condition on the node locations. For this, we define the concept of m-unisolvency: 

Definition 5. The set of nodes {z^\ . . . ,z^ n )} C R d is called m-unisolvent if the unique polynomial of 
total degree at most m interpolating zero data on {z {1 \ . . . , z^} is the zero polynomial. 

Given the properties of the cubic RBF kernel, we require that {z^\ . . . , z^} form a 1-unisolvent set in 
M. d . For a simple example from jS], 3 collinear points in M 2 do not form a 1-unisolvent set, because different 
rotations of the zero-plane can interpolate zero data over these nodes. However, 3 non-collinear points in M 2 
form a 1-unisolvent set. The requirement of a 1-unisolvent set in R d is equivalent to the following condition: 



span{(«W - for i, j = 1, . . . , n} = R d . (29) 

With the definition of m-unisolvent, and the property of strictly conditionally positive definite, we may now 
employ the following theorem to guarantee non-singularity of the interpolation matrix: 

Theorem 1 (7.2 from |9j). If the real-valued even function 7 : R d -> K is strictly conditionally positive 
definite of order m and the points {z^ x \ . . . , z^} form an (m — \)-unisolvent set, then the following system 
of linear equations is uniquely solvable: 



G 


P 




a 




/ 


P T 















(30) 



where G VJ = T (||z« - for i,j = l,...,n;f= [/(z«), . . . , f(z^)} T ; P kl = Pl (z^) fork = l,...,n 

and I — 1,...,M, and the polynomials pi{z) for I = 1, ...,M form a basis for the linear space of all 
polynomials up to degree (m — 1). 

The proof of the theorem demonstrates the need for the technical conditions on the interpolation nodes 
as well as the augmentation of the cubic RBF basis with the constant and linear polynomials to guarantee 
a unique interpolant. 
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Proof 1. Following the proof in jS], we assume [a,(3] T is in the null space of the interpolation matrix in 
(30 1, i.e. we let / = 0. If we consider the top block, and left multiply by a T we have 



<x T Gcx + a T P(3 = 0. (31) 



We observe that ct T P = T , since P T a = from the lower block of (30 1. This leaves 

a T Ga = 0. (32) 
We observe that the quadratic form a T Ga = if and only if a — 0, since 7 is strictly conditionally positive 



definite of order to. Revisiting the top block of (30 1, we now see that 

P/3 = 0. (33) 

The (to — l)-unisolvency of {z^\ . . . , z^} enforces that (3 = 0. □ 

In our case (m = 2), we use the constant and linear polynomials: p\{z) = 1 and pi(z) — zi_i for 
I = 2, ...,d+ 1 (zi denotes the l-th coordinate of z in R d ). Then, given that {zW, . . . , *(")} for ms a 
1-unisolvent set of interpolation nodes in M. d , we are guaranteed the existence of a unique interpolant 
s : R d -> R of the form: 

n d 

s(z) = « W 11^ - * W II 3 + /3o + E /W ( 34 ) 
i=i j=i 

We must now concern ourselves with the types of target functions to which a cubic RBF interpolant will 
converge. In particular, we hope to approximate the components of 3> -1 with arbitrary precision. In order 
to characterize the native space of the cubic RBF, we begin with the definition of the Beppo Levi space: 

Definition 6. For I > d/2, the linear space 

BU{R d ) := {/ e C(R d ) : D a f e L 2 (R d ),\/\a\ = I}, (35) 

equipped with the inner product 

(f,9) BLl m = E ^( Da f>D a 9) L2 (n*), (36) 

is called the Beppo Levi space on R d of order I, where D a denotes the weak derivative of (multi-index) order 
a E N d on R d . 

When the dimension, d, is odd, the native space of the cubic RBF is the Beppo Levi space on R d of order 
/ = [II]- For even dimension, the Beppo Levi space on R d of order I = ^±2 corresponds to the native 
space of the thin plate spline g(z, w) = \\z — w\\ 2 \og\\z — w\\ [TT|. The details of the proof are technical, and 
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the interested reader is referred to Theorem 10.43 in [TT|. We lack a characterization of the native space for 
the cubic RBF in even dimension. 

At this point it is natural to briefly discuss the optimality of polyharmonic spline interpolants. It is a 
well-known result that the natural cubic spline minimizes the integral of the squared second derivative over 
the space of interpolants in one-dimension. Likewise, in two-dimensions the thin-plate spline interpolant to 
scattered data minimizes the so called "bending energy" (hence the term thin-plate spline, or surface spline). 
These optimality results were generalized to any dimension and order of the iterated Laplacian in [23 . An 
excellent summary of the results of the variational approach can be found in |24| . The generalized results 
for the iterated Laplacian have some interesting and initially surprising implications. For example, in three- 
dimensions the optimal interpolant in terms of minimizing the second-order (generalized) derivatives is the 
linear radial function |25| . In higher dimension, the family of polyharmonic splines no longer provides an 
optimal interpolant for minimizing the analog to the two-dimensional bending energy. Instead, appropriate 
choices from the family of polyharmonic splines (thin plate splines in even dimension and radial powers in 
odd dimension) are optimal relative to higher orders of the iterated Laplacian. An important observation 
is that the physical interpretation of bending energy and thus the motivation for optimality with respect to 
the second order derivatives disappears beyond two-dimensions. 

All of our numerical experiments have demonstrated equal or better performance of the cubic RBF 
relative to the thin plate spline regardless of whether we work in even or odd dimension. Similar results 
have been observed in [26 . Thus, to promote algorithmic simplicity for practical applications, we have 
chosen to work solely with the cubic RBF. 

For completeness, we will show that the functions we wish to approximate are members of the native 
space of the cubic RBF provided that we work in odd dimension. We first observe that, C°°(IR d ) |~| C c (M. d ) C 
BLi(M. d ), for any I, where C c (R d ) is the linear space of compactly supported functions on M. d . The compo- 
nents of 0^7 1 : $(M) -> R for k = 1, . . . , D can be extended to functions in C°°{R d ) f| C c (R d ). Thus, 
the components of 3> _1 are members of the native space of the cubic RBF, provided that d is odd. 

The remaining question is the rate of convergence for a cubic RBF interpolant. Theorem 11.16 in |11| 
establishes algebraic convergence of at least 0(}i 3 Jq) of the cubic RBF interpolant to functions in the native 
space (we again leave out the technical details here). In practice, we have experienced much higher rates 
of algebraic convergence, as will be seen in the experimental section. Theorems exist that improve upon 
the order 3/2 bound for convergence of the cubic RBF, given additional conditions on the smoothness and 
boundary conditions of the target function, for example in \27\ 128) 129] . Any order of algebraic convergence 
is still slower than the exponential convergence of the Gaussian RBF. Nonetheless, our opinion is that in 
many applications the benefit gained by the simplicity and numerical stability of the scale-free cubic RBF 
kernel will offset the reduced convergence rate relative to the Gaussian. 
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6.3. Far Field Behavior of the Cubic RBF Interpolant 

Understanding the behavior of the cubic RBF interpolant near boundaries of the data is more challenging 
in the multivariate case. Nonetheless, we will observe that inclusion of constant and linear polynomial terms 
in the interpolant will improve the far field behavior of the interpolant, in addition to providing for unique 
solvability. To begin the analysis, we translate to spherical coordinates in M. d (generalizing the argument 
of [22] from 2 to (/-dimensions) : 



(37) 



z\ = r cos fi = ryi 

Z2 = r sin 9\ cos 02 = rj2 

z d = r sin 6»i . . . sin0 d _i cos6> d = r-/ d . 
We proceed to consider the interpolant 



s(z) = 5> W H*-* W II 3 + A) + X>^ (38) 
i=i j=i 

n d 

= £ a W((ft - z[ l) ) 2 + ... + (z d - zfffl* + p + £ Mi (39) 

i=l 3=1 
n d 

= J2 « w dni - 4 l) ) 2 + ■■■ + (n d zffft* + (3 + rJ2 Mi (40) 

i=l 3=1 
n 

= £ a«(r 2 ( 7l 2 + • • ■ + t3) - 2r(71*l° + • • • + + ((*i ) a + ■ ■ ■ + (*S°) 2 )) >/a + 

i=l 

fa+r^TMi ( 41 ) 

= r 3 £ a („ (1 _ i 2(7iZ W + + ^ + 1 ((je C0 )3 + . . . + (z W )2)) 3/2 + ^ + r J- ^. 7 .. (42) 
i=l r r 3=1 

If we Taylor expand the first sum around - = (r = oo), we find 



«(*) = r 3 jl> W }+r 2 |-3E^E« ( V' ) }+^{---} + {---} + ^{---} + --' (43) 
= r{... } + {... } + } + •-., (44) 

where the first two terms are zero as we enforced the constraints Y^i=i a ^ = ^27=1 a ^ = ^ or ' = 
1, . . . , d. Thus, we observe that for large r, the cubic RBF with inclusion of constant and linear polynomial 
terms, behaves linearly in the radial direction. Alternatively, we observe that d 2 s/dr 2 = 0(l/r 3 ) for large 
r. By only adding up to linear polynomials, we maintain algorithmic simplicity, and regularize the far field 
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behavior of the interpolant. The practical consequence of this observation is that we should expect at most 
linear divergence of the interpolant if we extrapolate outside the convex hull of the data. In particular, we 
should not experience divergence with r 3 as we might naturally expect based on the form of the interpolant. 



7. Experiments 

Convergence of RBF interpolants is typically addressed in terms of the L°°-norm as a function of fill 
distance. Both of these are measures of the "worst-case" scenarios, and may not represent typical performance 
over the domain. Additionally, estimation of both of these quantities is difficult, particularly near the 
boundaries of a discretely sampled domain. For these reasons, we have chosen to measure node spacing 
and error in a more natural manner. In the first two examples, we synthetically sample a manifold. In 
order to assess relative performance of the inverse mapping algorithms, we measure error as a function of 



local fill distance, hi oca i, from (18 1. Local fill distance is chosen to provide an appropriate measurement of 
node spacing under randomized sampling schemes. The relationship between local approximation error and 
variable local node density has been investigated by |15| \16\ . Although we do not address variable node 
density in the present work, we note that the notion of local fill distance is easily generalized to accommodate 
this issue which is likely to arise in many applications. 

Similarly, error will be quantified with average I 2 error. Given a dataset {x' 1 ), . . . , x^ n '} C K D with a 
corresponding low-dimensional representation {y^\ . . . , y^ n '} C R d , some random subset {a;" 1 -', . . . , x^ m ^} 
will be selected for testing of the inverse mapping algorithms. For each of these points, = <& _1 (y' Ji )), 
we will compute the approximate inverse ^(y^*'). The average I 2 error, E avg , is calculated as 

E aog = - 1 £\\*M-*HvM)l (45) 

i—l 

We can think of the average I 2 error as an approximation of the L 1 -norm of the Euclidean error over the 
domain. This measure will appropriately assess the typical performance of the algorithms over the entire 
dataset. 

The following examples demonstrate the rapid convergence of the cubic RBF inverse mapping method. 
We observe convergence rates substantially faster than the G(1i^q) bound [TT|. We expect increased orders 
of algebraic convergence provided additional regularity of the target function beyond the minimum require- 
ments of the native space (see theorems in [27, 28, 29J). We begin with a very simple example and proceed 
to evaluate the performance of the algorithm on problems with increasing complexity. 

7.1. Unit Circle 

In this example, n points {a;*- 1 -*, . . . , x^} are evenly distributed on the unit sphere S 1 (the unit circle in 
M 2 = M. D ). Each point is separated from its two nearest neighbors by a distance of exactly 2sin(-^). One 
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Figure 7: Performance of cubic RBF (left), Gaussian RBF (center), and Shepard's method [6] (right) inverse mapping algorithm 
on unit circle in M 2 . Node spacing is measured by local fill distance, hi oca l jl8| l. Error is average I 2 error, i^avg (45) , Note 
difference in range of j/-axis. 

additional point x is placed on the unit circle half-way between a^ 1 ) and x^ 2 \ The data are mapped to 
{t/ 1 ), . . . , y( n \y} C R d = K 2 using the first two non-trivial eigenvectors of the graph Laplacian. The inverse 
mapping algorithms attempt to accurately re-generate the point x £ M. D from y € R d , by interpolation of 
the data {x^\ . . . , x^} C R D as a function of their coordinates {y^\ j/™- 1 } C R d . 

The performance of the numerical inverse mapping methods versus local fill distance is shown in figure [7] 
The convergence of cubic RBF inverse mapping is rapid, and appears to scale approximately with O(hf ocal ). 
Interpolation error decreases with fill distance to approximately 10 -15 when machine precision prevents 
further improvement in the interpolation. The algorithm captures curvature extremely well, even with 
sparse data: with only 10 interpolation nodes on the unit circle (the largest fill distance shown in figure |7| 
the interpolation error is approximately 10 -3 . 

For comparison, the same results are shown also in figure [7] for the Gaussian RBF inverse mapping al- 
gorithm as well as Shepard's method |6j. The Gaussian RBF interpolant initially shows rapid convergence, 
down to the range of 10~ 6 to 10 -9 depending on choice of e, before ill-conditioning of the interpolation 
matrix leads to propagation of errors. Shepard's method [6] converges more slowly, and appears to scale 
approximately with 0(hf ocal ). In all cases, the inverse mapping method was implemented locally, with 
the number of nearest neighbors used in the mapping set to either 40 or the total number of data points, 
whichever is fewer. Shepard's method [S] shows clear evidence of the transition from a global to a local im- 
plementation of the algorithm (only demonstrating convergence following the transition to the local regime) . 
The convergence of the cubic RBF algorithm shows no such evidence, suggesting that the local and global 
implementations are equally effective. 
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Figure 8: Performance of cubic RBF (left), Gaussian RBF (center), and Shepard's method _6j (right) inverse mapping algorithm 
on S 4 embedded in R 10 . Node spacing is measured by local fill distance, hi oca i flSp . Error is average I 2 error, &avg ( ]45| , Note 
difference in range of y-axis. 

7.2. Unit Sphere in R D 

In this example, n points {a^ 1 * 1 , . . . , x^} are randomly distributed on the unit sphere S* 4 , then em- 
bedded in R 10 via a random unitary transformation. The data are mapped to {j/ 1 -* , . . . , j/™-*} C R rf = 
R 5 using the first five non-trivial eigenvectors of the graph Laplacian. The inverse mapping algorithms 
attempt to accurately re-generate one point x^ 6 R D from j/w) g K d ; by interpolation of the data 
{a;W,.. .,x^~ 1 \x^ +1 \.. . ,x^} C R D as a function of their coordinates {y^, . . .,y^~ 1 \y^ +1 \ . . .,y {n) } C 
R d . The performance of the cubic RBF, Gaussian RBF, and Shepard's method [6J versus local fill distance 
is shown in figure [8] 

The minimum of the total number of neighbors, and 100 neighbors was used to provide better local 
coverage in higher dimension. Also, at each fill level, the inverse mapping was repeated on a random subset 
of the nodes and the errors were averaged to provide a better estimate of average inverse mapping error. 
In figure [8] we observe that the convergence of cubic inverse mapping is the fastest, and appears to scale 
approximately with O(hf ocal ). Shepard's method [5] again only convergences in the local regime. 

7.3. Handwritten Zeros Dataset 

In addition to the previous "artificial" test examples, performance of the inverse mapping algorithm was 
assessed on a "naturally occurring" high-dimensional data set: a set of digital images of handwritten zeros. 
The data set consisting of 1000 handwritten zeros was derived from the MNIST database [3D]. The images 
were centered and size-normalized in a fixed size image. Each image in the database is a 28 x 28 gray scale 
image. These images were resized to 14 x 14 pixels, reshaped into a vector, and normalized to have Euclidean 
norm 1, providing a dataset of 1000 points in R 196 . 
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A 10-dimensional representation of the 1000 handwritten zeros was generated via Laplacian Eigenmaps. 
Then the inverse mapping techniques were tested on all images in the set, and compared to the original. 
Figure [9] shows a representative reconstruction via the various inverse mapping techniques. Figure [TO] shows 
the histogram of errors for the three methods. Shepard's method [6j performs the most poorly, typically 
generating a very blurry reconstruction of the image, as a simple linear combination of its neighbors. The 
Gaussian RBF method reduces reconstruction error relative to Shepard's method, but tends to inaccurately 
reproduce certain pixel values. The cubic RBF performs the best, producing a crisp reconstruction of the 
image, with the lowest reconstruction error. 



8. Discussion 

In the previous section we demonstrated excellent performance of a numerical inverse mapping technique 
involving interpolation using the cubic RBF. We now provide a novel interpretation of the Nystrom Method, 
a scheme commonly used to interpolate the eigenvectors of the weight matrix W . As we will see, the Nystrom 
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Figure 10: Histogram of reconstruction error for all inverse mapping algorithms on the handwritten zeros dataset. Error is I 2 , 
treating images as vectors in M 196 . 

Method directly generates an RBF intcrpolant of the eigenvectors of the weight matrix. This interpretation 
provides new insight into the limitations and potential pitfalls of the Nystrom Method. 

8.1. Revisiting Nystrom 

The Nystrom method was developed as a technique for numerical approximation of integral eigcnfunction 
problems of the form [8] 

6 

k(x,y)<f)i(y)dy = \i<j)i(x). (46) 

If we sample the function at a set of evenly-spaced sample points x^ for i = 1, . . . , n, we can approximate 
the integral with a simple quadrature rule, 

h —^k{x^\x^)Ux U) ) = XM^)- (47) 
n * — ' 

i=l 

This is equivalent to the matrix eigenvalue problem (absorbing the constant into A;), 

Wfr = A z <fr, (48) 

where Wij = i(zW,iW) and cj>i and A; are the eigenvectors and corresponding eigenvalues of W for 
i = 1, . . . ,n. Here we generalize the problem to cc^-* € K D . For A = diag(Ai, . . . , A„) and $ = [<pi, . . . , 4> n ] £ 
M. nxn , we consider the full eigenvector decomposition of the weight matrix W: 

W = $A$ T . (49) 

In the Nystrom context, we again define the mapping 4n : R D — > R as follows: <pi{x^) = cf>u, where 4>u is 
the i, I entry of the eigenvector matrix $ 6 M. nxn . 
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Provided that A; 7^ 0, the Nystrom Extension provides a technique to extend the eigenvector </>;, defined 
over a set of sample points, to an arbitrary new point x as |31| 

n 

! i=i 

Perhaps the most common choice for the kernel function k is the Gaussian: k(x^ l \x^') = cxp(— e 2 ||a;( l ) — 
a;^^|| 2 ). In this case, the affinity matrix W is guaranteed non-singular under only the simple condition of 
non-coincident nodes in R"° [5] (thus, A; 7^ for I = 1, . . . ,n). Additionally, since W is a real symmetric 
matrix, when normalized, {</>;}" = i form an orthonormal basis for R" (i.e. $ T $ = I n , the identity matrix on 
R"). 



We now proceed by re-writing 4>i{x) from (50), using the notation k(x, •) = [k{x, x^), . . . , k(x, x^)] T 



k(x) = \J 1 k{ X ,-) T <l> l 



T 



= k(x,-) T $[0...\- 1 ...0] 
= fc(a V ) T *A- 1 [0...1...0] T (51) 
= fc(;r,-) T *A- 1 * T i 
= k{x,-) T W- l 4n- 

If we compare the last line to (14), we observe that in the case of a Gaussian kernel matrix W, the Nystrom 
extension generates a radial basis function interpolation of the eigenvectors of W. This interpretation 
provides insight into some potential pitfalls of the Nystrom Method. 

The first important observation about the Nystrom interpolation scheme, is the sensitivity to the scale 
parameter in the case of the typical Gaussian kernel weight matrix. If the scale parameter e is too large, 
the basis functions will be too localized and provide a poor interpolant. On the other hand, if the scale 
parameter is too small, the weight matrix W will be very poorly conditioned, and numerical errors may 
distort the interpolant. A great deal of research has focused on methods to select an appropriate scale 
parameter (e.g. |141 l3"2] l. 

The second observation involves the dangers of sparsifying the weight matrix. In many nonlinear di- 
mensionality reduction applications, it is typical to sparsify the kernel matrix by setting many entries in 
the weight matrix to zero. Two strategies are typical: thresholding the matrix, and the /c-nearest neighbor 
approach. Thresholding is carried out as follows, 

expt-e 2 !!^)-^)^) if \\xV-xWWK8, 
Wh = { _ (52) 

otherwise. 

The fc-nearest neighbor approach is as follows, 
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Figure 11: Example RBF interpolation using a truncated Gaussian as basis function. 



, exp(-e 2 ||a;W - x ti)\\ 2 ) if x Ci) g j\T (i) 
W i:j = { (53) 
otherwise, 

where N x (i) denotes the set of fc-nearest neighbors of 

If the Nystrom extension is applied to a thresholded Gaussian kernel matrix, then the result is an RBF 
interpolation of the eigenvectors of the submatrix, where the basis function are truncated Gaussians. This 
is clearly problematic, and will likely generate a very poor (discontinuous) interpolant, as demonstrated in 
figure 1 1 1 1 In the fc-nearest neighbor approach, the Nystrom method cannot be interpreted as a consistent 
RBF interpolant, as the truncation distance varies from node to node. As a result, this method is also likely 
to perform poorly. 

A better alternative to Nystrom would simply involve a more intelligent interpolation scheme to extend 
the eigenvectors to new points in the space. The eigenvectors may be calculated using a (possibly truncated) 
Gaussian weight matrix over a subset of the nodes. However, these eigenvectors should be extended using a 
consistent interpolation scheme such as a true (non-truncated) Gaussian RBF, or a cubic RBF interpolant 
which provides better results as indicated in this paper. Local instead of global implementation of the 
interpolation algorithm may provide significant computational savings in certain scenarios. 



9. Summary and Further Work 

A numerical method is proposed to approximate the inverse of a general bi-Lipschitz nonlinear dimen- 
sionality reduction mapping, where the forward and consequently the inverse mappings are only explicitly 
defined on a discrete dataset. An RBF interpolant is used to independently interpolate each component of 
the high-dimensional representation of the data as a function of its low-dimensional representation. The 
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scale-free cubic RBF kernel is shown to perform better than the Gaussian kernel, as it does not require 
the difficult-to-choose scale parameter as an input, and does not suffer from ill-conditioning. Inclusion of 
additional constant and linear polynomial terms in the cubic RBF basis improves behavior of the interpolant 
near boundaries, and guarantees non-singularity of the interpolation matrix. 

Following exploration of the RBF inverse interpolation scheme, an interpretation of Nystrom as an RBF 
interpolant suggests that this method should not be directly used as an extension scheme when affinities 
are measured using a truncated Gaussian. Such a scheme will generate a poor (discontinuous) eigenvector 
interpolation. The present results suggest that reliability of the Nystrom method could be improved by 
using cubic RBF interpolation of eigenvectors instead of the current method which generates the RBF 
interpolation using the kernel used to build the weight matrix (typically the Gaussian) . 

The present algorithm generates an exact interpolant to the data. Performance in applications with 
noisy data will be improved by an approximate interpolation technique. This can be achieved in various 
ways: for example by using fewer RBF kernel locations than data points, and finding the "best" solution 
(least squares for example), or by a regularized approach which puts a penalty on the fitting coefficients 
(e.g. Kernel Ridge Regression). Application of such strategies will be the subject of future work. 

Acknowledgments 

NDM was supported by National Science Foundation grant DMS 0941476; FGM was partially supported by 
National Science Foundation grant DMS 0941476, and Department of Energy award DE-SCOO04096. 

References 

[1] B. Scholkopf, A. Smola, K.-R. Miiller, Kernel principal component analysis, in: Advances in Kernel Methods-Support 

Vector Learning, MIT Press, 1999, pp. 327-352. 
[2] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation 15 

(2003) 1373-1396. 

[3] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323—2326. 

[4] A. Elgammal, C.-S. Lee, Nonlinear manifold learning for dynamic shape and dynamic appearance, Computer Vision and 

Image Understanding 106 (2007) 31-46. 
[5] A. Elgammal, C.-S. Lee, The role of manifold learning in human motion analysis, in: Human Motion, Springer, 2008, pp. 

25-56. 

[6] D. Kushnir, A. Haddad, R. Coifman, Anisotropic diffusion on sub-manifolds with application to earth structure classifi- 
cation, Applied and Computational Harmonic Analysis 32 (2012) 280-294. 

[7] M. Belkin, P. Niyogi, Towards a theoretical foundation for Laplacian-based manifold methods, Journal of Computer and 
System Sciences 74 (2008) 1289-1308. 

[8] E. Nystrom, Uber die praktische Aufiosung von linearen Integralgleichungen mit Anwendungen auf Randwertaufgaben 
der Potentialtheorie, Commentationes Physico-Mathematicae 14 (1928) 1—52. 

[9] G. Fasshauer, Meshfree Approximation Methods With MATLAB, Interdisciplinary Mathematical Sciences, World Scien- 
tific, 2007. 

25 



[10] M. Buhmann, Radial basis functions, Acta Numerica 9 (2000) 1-38. 

[11] H. Wendland, Scattered Data Approximation, Cambridge Monographs on Applied and Computational Mathematics, 

Cambridge University Press, 2004. 
[12] M. Powell, Radial basis function methods for interpolation to functions of many variables, in: HERCMA, 2001, pp. 2-24. 
[13] B. Fornberg, J. Zuev, The Runge phenomenon and spatially variable shape parameters in RBF interpolation, Comput. 

Math. Appl. 54 (2007) 379-398. 
[14] A. Bermanis, A. Averbuch, R. Coifman, Multiscale data sampling and function extension, Applied and Computational 

Harmonic Analysis 34 (2013) 15-29. 
[15] R. Devore, A. Ron, Approximation using scattered shifts of a multivariate function, Transactions of the American 

Mathematical Society 362 (2010) 6205-6229. 
[16] T. Hangelbroek, On local RBF approximation, Advances in Computational Mathematics 37 (2012) 285-299. 
[17] T. Driscoll, B. Fornberg, Interpolation in the limit of increasingly fiat radial basis functions, Comput. Math. Appl. 43 

(2002) 413-422. 

[18] B. Fornberg, G. Wright, Stable computation of multiquadric interpolants for all values of the shape parameter, Comput. 

Math. Appl. 48 (2004) 853-867. 
[19] B. Fornberg, C. Piret, A stable algorithm for flat radial basis functions on a sphere, SIAM J. Sci. Comput. 30 (2007) 

60-80. 

[20] B. Fornberg, E. Larsson, N. Flyer, Stable computations with Gaussian radial basis functions, SIAM Journal on Scientific 
Computing 33 (2011) 869-892. 

[21] B. Fornberg, E. Lehto, C. Powell, Stable calculation of Gaussian-based RBF-FD stencils, Comput. Math. Appl. 65 (2013) 
627-637. 

[22] B. Fornberg, T. Driscoll, G. Wright, R. Charles, Observations on the behavior of radial basis function approximations 

near boundaries, Comput. Math. Appl. 43 (2002) 473-490. 
[23] J. Duchon, Splines minimizing rotation-invariant semi-norms in Sobolev spaces, in: W. Schempp, K. Zeller (Eds.), 

Constructive Theory of Functions of Several Variables, volume 571 of Lecture Notes in Mathematics, Springer Berlin / 

Heidelberg, 1977, pp. 85-100. 

[24] G. Wahba, Spline Models of Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, Society 

of Industrial and Applied Mathematics, 1990. 
[25] M. Powell, The theory of radial basis function approximation in 1990, in: W. Light (Ed.), Advances in Numerical 

Analysis: Volume II: Wavelets, Subdivision Algorithms, and Radial Basis Functions, Oxford University Press, USA, 1992, 

pp. 105-210. 

[26] S. Wild, C. Shoemaker, Global convergence of radial basis function trust region derivative-free algorithms, SIAM Journal 

on Optimization 21 (2011) 761-781. 
[27] T. Gutzmer, J. Melenk, Approximation orders for natural splines in arbitrary dimensions, Mathematics of Computation 

70 (2001) 699-703. 

[28] R. Schaback, Improved error bounds for scattered data interpolation by radial basis functions, Mathematics of Compu- 
tation 68 (1999) 201-216. 

[29] J. Yoon, Lp-error estimates for shifted surface spline interpolation on Sobolev space, Mathematics of Computation 72 

(2003) 1349-1367. 

[30] S. Gangaputra, Handwritten digit database, |http: //cis ■ jhu. e du/~sachin/digit/digit .html 2012. 

[31] C. Fowlkes, S. Belongie, F. Chung, J. Malik, Spectral grouping using the Nystrom method, IEEE Transactions on Pattern 

Analysis and Machine Intelligence 26 (2004) 214-225. 
[32] R. Coifman, S. Lafon, Geometric harmonics: A novel tool for multiscale out-of-sample extension of empirical functions, 



26 



Applied and Computational Harmonic Analysis 21 (2006) 31-52. 



27 



